Exercise

Load and preprocess the adult data as before, including dummy encoding and scaling. Learn a logistic regression model and visualize the coefficients. Then grid-search the regularization parameter C. Compare the L1 penalty to the L2 penalty. How do the coefficients differ? Which features are the most important?



In [ ]:

    
import pandas as pd
# The file has no headers naming the columns, so we pass header=None
# and provide the column names explicitly in "names"
data = pd.read_csv(
    "adult.data", header=None, index_col=False,
    names=['age', 'workclass', 'fnlwgt', 'education', 'education-num',
           'marital-status', 'occupation', 'relationship', 'race', 'gender',
           'capital-gain', 'capital-loss', 'hours-per-week', 'native-country',
           'income'])
# fnlwgt is a census sampling weight rather than a property of the person,
# so it is not a useful feature in this context
data = data.drop("fnlwgt", axis=1)
data.head()
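One possible way to approach the remaining steps is sketched below. It is not a reference solution: the helper name `compare_penalties`, the grid of `C` values, the train/test split, and the choice of the `liblinear` solver (picked here because it supports both penalties) are all assumptions made for illustration, and the `penalty` argument is assumed to be accepted by the installed scikit-learn version.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_penalties(data, target="income", param_grid=None, cv=5):
    """Grid-search C for L1- and L2-penalized logistic regression.

    Hypothetical helper. Returns a dict mapping the penalty name to a tuple
    (fitted GridSearchCV, coefficient Series, held-out accuracy).
    """
    if param_grid is None:
        # Assumed grid; widen or refine it as needed
        param_grid = {"logisticregression__C": [0.01, 0.1, 1, 10]}

    # Dummy-encode the categorical columns; numeric columns pass through
    X = pd.get_dummies(data.drop(target, axis=1), dtype=float)
    y = data[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=0)

    results = {}
    for penalty in ["l1", "l2"]:
        # Scaling lives inside the pipeline so that it is re-fit on the
        # training part of every cross-validation fold
        pipe = make_pipeline(
            StandardScaler(),
            LogisticRegression(penalty=penalty, solver="liblinear"))
        grid = GridSearchCV(pipe, param_grid, cv=cv)
        grid.fit(X_train, y_train)

        # Coefficients of the best model, labeled by feature name
        best_lr = grid.best_estimator_.named_steps["logisticregression"]
        coef = pd.Series(best_lr.coef_.ravel(), index=X.columns)
        results[penalty] = (grid, coef, grid.score(X_test, y_test))
    return results
```

Calling `results = compare_penalties(data)` and then inspecting `results["l1"][1]` and `results["l2"][1]` gives the two coefficient vectors side by side. Sorting each Series by absolute value is one way to rank features, and counting entries equal to zero shows how much sparser the L1 solution is, since the L1 penalty can drive coefficients exactly to zero while L2 only shrinks them. If matplotlib is installed, `coef.plot.barh()` is one way to visualize them.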